Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss

Author: A. Tofigh
A.J. Vilella
B. Sennblad
C. Chauve
C. Conow
C.E.V. Storm
E.V. Koonin
F. Rutschmann
I. Wapinski
J.-P. Doyon
J.G. Burleigh
K.Y. Gorbunov
L.A. David
M. Charleston
M. Goodman
M. Stolzer
M.D. Rasmussen
M.S. Bansal
P. Bonizzoni
P. Górecki
R. Heijden van der
R.D.M. Page
Y. Ovadia
Z.Z. Chen
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2013

Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication-loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn^2) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution.

Funding: National Science Foundation (U.S.) (CAREER Award 0644282); National Institutes of Health (U.S.) (Grant RC2 HG005639); National Science Foundation (U.S.), Assembling the Tree of Life (Program) (Grant 0936234)
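The abstract above describes sampling optimal solutions uniformly at random. One common way to do this for a dynamic program is to count the optimal completions of every subproblem and then choose each step with probability proportional to those counts. The sketch below illustrates that general idea on a toy problem (minimum-cost right/down paths through a grid). It is not the authors' DTL algorithm: the grid problem and the names `make_sampler`, `solve`, and `sample` are assumptions made only for illustration.

```python
import random
from functools import lru_cache


def make_sampler(grid):
    """Build a counting DP and a uniform sampler over min-cost grid paths.

    Toy stand-in for "uniform sampling of optimal solutions"; hypothetical
    helper, not taken from the paper.
    """
    rows, cols = len(grid), len(grid[0])

    def moves_from(r, c):
        # Allowed moves: one step down or one step right.
        out = []
        if r + 1 < rows:
            out.append((r + 1, c))
        if c + 1 < cols:
            out.append((r, c + 1))
        return out

    @lru_cache(maxsize=None)
    def solve(r, c):
        """Return (min cost from (r, c) to the end, number of optimal paths)."""
        if (r, c) == (rows - 1, cols - 1):
            return grid[r][c], 1
        options = [solve(*m) for m in moves_from(r, c)]
        best = min(cost for cost, _ in options)
        count = sum(n for cost, n in options if cost == best)
        return grid[r][c] + best, count

    def sample(rng):
        """Draw one optimal path; every optimal path is equally likely."""
        r, c = 0, 0
        path = [(r, c)]
        while (r, c) != (rows - 1, cols - 1):
            candidates = moves_from(r, c)
            best = min(solve(*m)[0] for m in candidates)
            optimal = [m for m in candidates if solve(*m)[0] == best]
            # Weight each optimal move by how many optimal paths pass through it.
            weights = [solve(*m)[1] for m in optimal]
            r, c = rng.choices(optimal, weights=weights)[0]
            path.append((r, c))
        return path

    return solve, sample


grid = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
solve, sample = make_sampler(grid)
best_cost, n_optimal = solve(0, 0)
path = sample(random.Random(0))
```

Because each step is weighted by the number of optimal completions, the probability of any complete optimal path is the product of those ratios, which telescopes to one over the total number of optimal paths.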

eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges

Author: A. Roth
Altenhoff
C. von Mering
Chen
Chen
Ciccarelli
Creevey
D. Szklarczyk
Eisen
Gabaldon
Hulsen
I. Letunic
J. Muller
K. Trachana
Koonin
Kuzniar
L. J. Jensen
Linard
M. Kuhn
Makarova
Milinkovitch
P. Bork
Pearson
R. Arnold
S. Powell
T. Doerks
T. Rattei
Tatusov
Tatusov
Trachana
van der Heijden
von Mering
Wapinski
Publication venue: Oxford University Press
Publication date: 01/01/2012

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450 904 orthologous groups (62.5%)

University of Birmingham Research Portal

Copenhagen University Research Information System

ZORA

MDC Repository

Evaluating Ortholog Prediction Algorithms in a Yeast Model Clade

Author: A Alexeyenko
A Goffeau
A Kuzniar
A Rokas
AM Altenhoff
Antonis Rokas
B Dujon
BN Kent
C Dessimoz
C Vogel
Cecile Fairhead
CEV Storm
CEV Storm
CM Zmasek
CP Kurtzman
DA Fitzpatrick
DP Mindell
DP Wall
DP Wall
DR Scannell
EV Koonin
EV Koonin
EW Sayers
F Chen
F Chen
F Lemoine
FS Dietrich
I Wapinski
I Wapinski
J Ehrlich
JC Chiu
JL Gordon
K Liolios
KH Wolfe
KP Byrne
KP O'Brien
L Li
L Salichos
LA Mirny
LB Koski
Leonidas Salichos
M Kellis
M Remm
MP Cummings
O Akerborg
P Bork
P Bork
P Cliften
R Overbeek
RL Tatusov
RL Tatusov
RR Sokal
S Grossetete
SF Altschul
SF Altschul
T Hulsen
TF DeLuca
V van Noort
WM Fitch
Publication venue: Public Library of Science
Publication date: 13/04/2011

[...] RSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. [...] of all algorithms dramatically increased in these traps.) [...] for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant

Public Library of Science (PLOS)

Gene Expression Divergence is Coupled to Evolution of DNA Structure in Coding Regions

Author: A Heger
A Mazin
AG Pedersen
B Lemos
BJ Venters
BY Liao
C Ueguchi
CJ McManus
D Wang
DK Pokholok
DS Goodsell
Eran Segal
FC Holstege
GC Liao
I Artsimovitch
I Tirosh
I Tirosh
I Tirosh
I Wapinski
IK Jordan
J Perez-Martin
JA Greenbaum
JH Bullard
JK Choi
JK Choi
JM Ranz
K Florquin
KD Makova
L David
M Friedel
MA Harris
MA Sartor
MV Rockman
N Kaplan
P Baldi
P Khaitovich
PJ Wittkopp
PJ Wittkopp
PM Sharp
R Hermsen
R Rohs
R Rohs
R Shalgi
RE Dickerson
SC Parker
SV Nuzhdin
SW Doniger
T Abeel
TD Yager
TR O'Connor
W Lee
WK Olson
Xianhua Dai
Y Field
Z Hu
Zhiming Dai
Publication venue: Public Library of Science
Publication date: 01/01/2011

Sequence changes in the coding and regulatory regions of the gene itself (cis) determine most of the gene expression divergence between closely related species. However, gene expression divergence between yeast species is not correlated with the evolution of primary nucleotide sequence, which indicates that other cis factors direct gene expression divergence. Here, we studied the contribution of the evolution of three-dimensional DNA structure, as a cis factor, to gene expression divergence. We found that the evolution of DNA structure in coding regions and gene expression divergence are correlated in yeast. A similar result was also observed between Drosophila species. DNA structure is associated with the binding of chromatin remodelers and histone modifiers to DNA sequences in coding regions, which influence RNA polymerase II occupancy, which in turn controls gene expression level. We also found that genes with similar DNA structures are involved in the same biological processes and functions. These results reveal previously unappreciated roles of DNA structure as a cis effect in gene expression

CiteSeerX

Enrichment of homologs in insignificant BLAST hits by co-complex network alignment

Author: A Bateman
A Ruepp
B Snel
Berend Snel
EV Koonin
HW Mewes
I Wapinski
J Boekhorst
J Espadaler
J Soding
JB Pereira-Leal
Jos Boekhorst
KP Byrne
L Fokkens
L Li
L Matthews
Like Fokkens
M Ashburner
M Boube
M Campillos
M Kroiss
M Remm
P Smits
R Singh
R Szklarczyk
RA Notebaart
S Bandyopadhyay
Sandra MC Botelho
SF Altschul
SF Altschul
T Gabaldon
T Hubbard
Y Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Homology is a crucial concept in comparative genomics. The algorithm probably most widely used for homology detection in comparative genomics is BLAST. Usually a stringent score cutoff is applied to distinguish putative homologs from possible false positive hits. As a consequence, some BLAST hits are discarded that are in fact homologous. Results: Analogous to the use of genomic context in genome alignments, we test whether conserved functional context can be used to select candidate homologs from insignificant BLAST hits. We make a co-complex network alignment between complex subunits in yeast and human and find that proteins with an insignificant BLAST hit that are part of homologous complexes are likely to be homologous themselves. Further analysis of the distant homologs we recovered using the co-complex network alignment shows that a large majority of these distant homologs are in fact ancient paralogs. Conclusions: Our results show that, even though evolution takes place at the sequence and genome level, co-complex networks can be used as circumstantial evidence to improve confidence in the homology of distantly related sequences.

Springer - Publisher Connector

Queen's University Belfast Research Portal

Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis

Author: A Kuberl
A Lupetti
A Sellam
A Sellam
A Tavanti
A Vik
AD Giusani
Alessandro Guida
AP Jackson
AP Silva
B Dujon
BA Lasker
BA Weissenmayer
BB Tuch
BG Oliver
BS Davies
BS Davies
C Askew
C Notredame
C Trapnell
C Trapnell
CB Nielsen
Chen Ding
Claudia Lindstädt
D Lin
D Mattanovich
D Parkhomchuk
D Trofa
DA Fitzpatrick
Desmond G Higgins
DJ Diekema
DM Kuhn
EC van Asbeck
EC van Asbeck
ER Setiadi
G Butler
Geraldine Butler
GK Smyth
H Li
I Milne
I Wapinski
I Wapinski
J Bonhomme
JE Stajich
JH Bullard
JM Synnott
K De Schutter
K Rutherford
LY Zhang
MA Pfaller
MA Pfaller
Matthew Berriman
MD Wilkerson
ME Logue
MJ Hickman
N Rhind
Nicola J Corton
PM Dennison
PM Silver
QM Mitrovich
QM Mitrovich
RS Zitomer
S MacPherson
S Sai
Sarah L Maguire
SF Welbel
SW Roy
SW Roy
SW Roy
T Doedt
T Rossignol
T Rossignol
TA Clark
TJ Lott
TM Lowe
VM Bruno
X Hong
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Candida parapsilosis is one of the most common causes of Candida infection worldwide. However, the genome sequence annotation was made without experimental validation and little is known about the transcriptional landscape. The transcriptional response of C. parapsilosis to hypoxic (low oxygen) conditions, such as those encountered in the host, is also relatively unexplored. Results: We used next generation sequencing (RNA-seq) to determine the transcriptional profile of C. parapsilosis growing in several conditions including different media, temperatures and oxygen concentrations. We identified 395 novel protein-coding sequences that had not previously been annotated. We removed more than 300 unsupported gene models, and corrected approximately 900. We mapped the 5' and 3' UTR for thousands of genes. We also identified 422 introns, including two introns in the 3' UTR of one gene. This is the first report of 3' UTR introns in the Saccharomycotina. Comparing the introns in coding sequences with other species shows that small numbers have been gained and lost throughout evolution. Our analysis also identified a number of novel transcriptionally active regions (nTARs). We used both RNA-seq and microarray analysis to determine the transcriptional profile of cells grown in normoxic and hypoxic conditions in rich media, and we showed that there was a high correlation between the approaches. We also generated a knockout of the UPC2 transcriptional regulator, and we found that, as in C. albicans, Upc2 is required for conferring resistance to azole drugs, and for regulation of expression of the ergosterol pathway in hypoxia. Conclusion: We provide the first detailed annotation of the C. parapsilosis genome, based on gene predictions and transcriptional analysis. We identified a number of novel ORFs and other transcribed regions, and detected transcripts from approximately 90% of the annotated protein coding genes. We found that the transcription factor Upc2 has a conserved role as a major regulator of the hypoxic response in C. parapsilosis and C. albicans.

Springer - Publisher Connector

eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

Author: A. Roth
Altschul
Aurrecoechea
Berglund
C. von Mering
D. Szklarczyk
Datta
Edgar
Eyre
Felsenstein
Finn
Fitch
Gilbert
Guindon
Harris
Hubbard
Huerta-Cepas
I. Letunic
J. Muller
Jensen
Jensen
Kanehisa
Katoh
Koonin
Kriventseva
Kuhn
Kuzniar
L. J. Jensen
Letunic
Letunic
Li
Loytynoja
M. Kuhn
Makarova
P. Bork
P. Julien
Pruitt
Roth
S. Powell
Saebo
Sonnhammer
Swarbreck
T. Doerks
Tatusov
Tatusov
Thompson
Thompson
Uchiyama
van der Heijden
Vilella
Wapinski
Waterhouse
Zmasek
Publication venue: Oxford University Press
Publication date: 01/01/2010

The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224 847 OGs, including 9724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2 242 035 proteins (built from 2 590 259 proteins) and provides a broad functional description for at least 1 966 709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de
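The abstract above mentions identifying reciprocal best BLAST matches. As a minimal sketch of that single step only (the triangular linkage clustering is not covered here), the following assumes that pairwise scores between two genomes are already available as dictionaries keyed by (query, target). The function names, the input layout, and the toy scores are all assumptions for illustration and are not the eggNOG pipeline.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score (hypothetical layout).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores, invented for this example.
a_vs_b = {
    ("a1", "b1"): 200.0,
    ("a1", "b2"): 50.0,
    ("a2", "b2"): 120.0,
    ("a3", "b1"): 90.0,
}
b_vs_a = {
    ("b1", "a1"): 198.0,
    ("b2", "a2"): 118.0,
    ("b2", "a1"): 40.0,
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input, a3's best hit is b1, but b1's best hit is a1, so a3 is left without a reciprocal partner.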

Copenhagen University Research Information System

UCL Discovery

ZORA

MDC Repository

Network Evolution: Rewiring and Signatures of Conservation in Signaling

The analysis of network evolution has been hampered by limited availability of protein interaction data for different organisms. In this study, we investigate evolutionary mechanisms in Src Homology 3 (SH3) domain and kinase interaction networks using high-resolution specificity profiles. We constructed and examined networks for 23 fungal species ranging from Saccharomyces cerevisiae to Schizosaccharomyces pombe. We quantify rates of different rewiring mechanisms and show that interaction change through binding site evolution is faster than through gene gain or loss. We found that SH3 interactions evolve swiftly, at rates similar to those found in phosphoregulation evolution. Importantly, we show that interaction changes are sufficiently rapid to exhibit saturation phenomena at the observed timescales. Finally, focusing on the SH3 interaction network, we observe extensive clustering of binding sites on target proteins by SH3 domains and a strong correlation between the number of domains that bind a target protein (target in-degree) and interaction conservation. The relationship between in-degree and interaction conservation is driven by two different effects, namely the number of clusters that correspond to interaction interfaces and the number of domains that bind to each cluster; these lead to sequence-specific conservation, which in turn results in interaction conservation. In summary, we uncover several network evolution mechanisms likely to generalize across peptide recognition modules

CiteSeerX

Public Library of Science (PLOS)